What are the major differences between Python and R for data science?

 

Both Python and R have vast software ecosystems and communities, so either language is suitable for almost any data science task. That said, there are some areas in which one is stronger than the other.

Where Python Excels

·         The majority of deep learning research is done in Python, so tools such as Keras and PyTorch have "Python-first" development. You can learn about these topics in Introduction to Deep Learning in Keras and Introduction to Deep Learning in PyTorch.

·         Another area where Python has an edge over R is in deploying models to other pieces of software. Python is a general-purpose programming language, so if your application is already written in Python, including a Python-based model requires no cross-language bridge. We cover deploying models in Designing Machine Learning Workflows in Python and Building Data Engineering Pipelines in Python.

·         Python is often praised for being a general-purpose language with an easy-to-understand syntax.

Where R Excels

·         A lot of statistical modeling research is conducted in R, so there's a wider variety of model types to choose from. If you regularly have questions about the best way to model data, R is the better option. DataCamp has a large selection of courses on statistics with R.

·         The other big trick up R's sleeve is easy dashboard creation using Shiny. This enables people without much technical experience to create and publish dashboards to share with their colleagues. Python does have Dash as an alternative, but it’s not as mature. You can learn about Shiny in our course on Building Web Applications with Shiny in R.

·         R's functionality was developed with statisticians in mind, thereby giving it field-specific advantages such as great features for data visualization.

This list is far from exhaustive, and experts endlessly debate which tasks can be done better in one language or the other. Further, Python programmers and R programmers tend to borrow good ideas from each other. For example, Python's plotnine data visualization package was inspired by R's ggplot2 package, and R's rvest web scraping package was inspired by Python's BeautifulSoup package. So eventually, the best ideas from either language find their way into the other, making both languages similarly useful and valuable.

If you’re too impatient to wait for a particular feature in your language of choice, it's also worth noting that there is excellent language interoperability between Python and R. That is, you can run R code from Python using the rpy2 package, and you can run Python code from R using reticulate. That means that most features present in one language can be accessed from the other. For example, the R version of the deep learning package Keras actually calls Python. Likewise, rTorch calls PyTorch.
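
As a rough illustration of the Python-to-R direction, the sketch below calls R's built-in mean() function through rpy2. It assumes that both the rpy2 package and a working R installation are available. The helper name r_mean is made up for this example, and the call is skipped with a message if the environment is not set up.

```python
def r_mean(values):
    """Compute a mean by handing the data to R's mean() via rpy2."""
    # Imported here so the function can be defined even without rpy2 installed.
    import rpy2.robjects as ro

    r_vector = ro.FloatVector(values)  # convert the Python list to an R numeric vector
    r_mean_function = ro.r["mean"]     # look up R's built-in mean() by name
    result = r_mean_function(r_vector)  # the result is a length-one R vector
    return result[0]


try:
    print(r_mean([1.0, 2.0, 3.0, 4.0]))
except Exception as exc:  # rpy2 or R is missing or misconfigured in this environment
    print(f"Skipping the R call: {exc}")
```

Going the other way, reticulate plays the same role from within an R session.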

Beyond features, the languages are sometimes used by different teams or individuals based on their backgrounds.

Who Uses Python

·         Python was originally developed as a programming language for software development (the data science tools were added later), so people with a computer science or software development background might feel more comfortable using it.

·         Accordingly, the transition from other popular programming languages like Java or C++ to Python is easier than the transition from those languages to R.

Who Uses R

·         R has a set of packages known as the Tidyverse, which provides powerful yet easy-to-learn tools for importing, manipulating, visualizing, and reporting on data. Anecdotally at least, people without any programming or data science experience can become productive with these tools more quickly than they would in Python.

·         If you want to test this for yourself, try taking Introduction to the Tidyverse, which introduces R's dplyr and ggplot2 packages. It will likely be easier to pick up than Introduction to Data Science in Python, but why not see for yourself which you prefer?

·         Overall, if you or your employees don't have a data science or programming background, R might make more sense.

Wrapping up, though it may be hard to know whether to use Python or R for data analysis, both are great options. One language isn’t better than the other; it all depends on your use case and the questions you’re trying to answer. Finally, I’ll share the first bit of a handy infographic comparing the two languages. I don’t want to include it all, as it’s very long and would require too much scrolling, but you can download the full image here.

https://qph.fs.quoracdn.net/main-qimg-cd43a3af500105fc6354549189cff912

References

https://www.quora.com/What-are-the-major-differences-between-Python-and-R-for-data-science